ATI Docket 000141

APPARATUS FOR HIGH DATA RATE SYNCHRONOUS INTERFACE 

AND METHOD THEREOF 



Field of the Invention 

The present invention relates generally to a method and apparatus for
high data rate synchronization, and more specifically to compensation
techniques associated with such a method and apparatus.



Background of the Invention

Systems having high speed synchronous interfaces are subject to timing
variations in their signals that can limit overall system performance.
For example, a system having a Single Data Rate (SDR) Synchronous
Dynamic Random Access Memory (SDRAM) device typically includes a
memory controller, a printed circuit board, and a memory device (SDRAM).
Between the memory controller and the memory device there is a
synchronous digital communication channel that is generally part of a
printed circuit board. Timing variances occur in the signals that are
transmitted over the interface due to static and dynamic variations.

Factors causing static timing variations include: length variations in
printed circuit board traces among individual traces that make up a common
data channel or bus; length variations in wire traces among individual wire
[...] substrate, such as the memory controller; propagation delay differences
[...]; and variations in the processing of devices, especially semiconductor
devices.



Factors causing dynamic timing variations include differences in
temperature and voltage, which affect the drive characteristics of
semiconductor device components. For example, a clock distribution
network having a plurality of buffers or drivers distributing a specific clock
signal will have a varying propagation delay as the drive characteristics of
the buffers and drivers vary with temperature and voltage.

All of the dynamic and static factors contribute timing variations and
uncertainty in high-speed synchronous interfaces. This is especially
problematic for READ cycles, where it is not possible to predict exactly when
READ data from a memory will arrive at the latching elements inside the
memory controller. This uncertainty is typically accounted for with timing
margin against the setup and hold requirements of the latching device;
however, increasing setup and hold margin reduces the maximum clock
frequency. Therefore, a method and apparatus that reduces timing variations
in a system would be useful.
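
The trade-off between timing margin and clock frequency can be sketched
numerically. The model below is a simplification introduced here for
illustration and is not part of the described apparatus: it assumes the
minimum usable clock period is the setup time of the latch, plus its hold
time, plus the width of the window over which the data arrival time is
uncertain. The function name and the sample values are hypothetical.

```python
def max_clock_mhz(t_setup_ns: float, t_hold_ns: float, uncertainty_ns: float) -> float:
    """Upper bound on clock frequency (MHz) under an assumed simple model.

    Assumption: the data must be stable for t_setup before and t_hold after
    the latching edge, and the arrival time of the data can move anywhere
    within a window of width uncertainty_ns.
    """
    min_period_ns = t_setup_ns + t_hold_ns + uncertainty_ns
    if min_period_ns <= 0:
        raise ValueError("timing parameters must sum to a positive period")
    return 1000.0 / min_period_ns  # 1/ns is GHz, so scale by 1000 for MHz


if __name__ == "__main__":
    # Hypothetical values: the same latch with two amounts of uncertainty.
    print(max_clock_mhz(1.0, 1.0, 6.0))  # wide uncertainty window
    print(max_clock_mhz(1.0, 1.0, 2.0))  # narrower uncertainty window
```

Under this assumed model, narrowing the uncertainty window is the only
term a system designer can shrink without changing the latch itself.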



Brief Description of the Drawings

Figure 1 illustrates, in block diagram form, an apparatus in accordance
with a specific embodiment of the present invention;

Figure 2 illustrates, in timing diagram form, timing relationships
associated with the apparatus of Figure 1;

Figure 3 illustrates, in block diagram form, a portion of the apparatus
of Figure 1 in greater detail; and

Figures 4-7 illustrate, in flow diagram form, methods in accordance
with specific embodiments of the present invention.

One skilled in the art will appreciate that elements in the figures are
illustrated for simplicity and clarity and have not necessarily been drawn to
scale. For example, the dimensions of some of the elements in the figures
are exaggerated relative to other elements to help to improve understanding
of embodiment(s) of the present invention.



Detailed Description of the Drawings



In a specific implementation of the present invention, a reference
signal input of a delay locked loop is connected to receive a reference clock.
The delay locked loop provides a drive clock that drives a clock distribution
tree. One of the endpoints of the clock distribution tree is connected to a
feedback reference of the delay locked loop. By connecting one of the
endpoints to the feedback reference of the delay locked loop, the signals
received at components attached to the endpoints of the distribution tree can
be synchronized to the reference input received at the delay locked loop.
Synchronizing the end point signals to the reference signals allows for the
compensation of static time variations, due to propagation delays of the
clock distribution tree, and the compensation of dynamic timing variations
due to the components that make up the clock distribution tree. Figures 1-7
illustrate specific embodiments of the present invention.

Figure 1 illustrates a system in accordance with a specific
embodiment of the present invention. Specifically, Figure 1 illustrates a
region 20 that corresponds to a portion of a semiconductor device 20 (device
20), a region 30 that corresponds to a portion of a circuit board 30, and a
region 40 that corresponds to a semiconductor device 40 (device 40). Each
of the regions has substrates with associated components or features. For
example, regions 20 and 40, which are semiconductor devices, can have
electronic components, such as the latch 25 and the controller 42, formed on
their respective substrates. Note that the interfaces 27, 29, 37, 39, 129, and
130 represent connections between the circuit board 30 and the
semiconductor devices 20 and 40. For example, the interface 27 includes a
circuit board pad, a semiconductor pad, packaging interconnects (if any),
and fastening materials such as solder.



Routing paths 32 and 34 are formed on the circuit board 30 to
connect the devices 20 and 40 to one another. For example, routing path 34
represents the trace(s) and components, if any, that provide a signal from the
interface 29, of device 20, to the interface 39, of device 40. Connections
between interface layers, such as vias, are considered part of the traces. In
one embodiment, the path 34 is a single continuous trace. In an alternate
embodiment, the path 34 can include discrete traces that are connected
through intermediate drivers that provide a signal from interconnect 29 to
the interconnect 39. For purposes of discussion, it will be assumed that any
drivers or logic devices in the path 34 are combinational logic devices that
do not require additional clocking.

In one implementation, the components illustrated as part of device 20
are part of a memory controller that controls access to a memory 40. It will
be appreciated that the devices 20 and 40 can include other functions in
addition to those illustrated. For example, the device 20 can include other
data processing modules such as a graphics driver, video driver, and audio
driver. The memory device 40 can be any type of memory, including volatile
or non-volatile memories. In a specific implementation, the device 40 is an
SDR SDRAM.

During a read operation, a memory controller associated with device
20 generates address and control signals which are provided to the memory
device 40. Subsequent to the address and control signals being received at
the controller 42 of the device 40, a clock signal is received at interconnect
39. The clock signal received at interconnect 39 causes a data signal to be
provided from the memory 40 at the interconnect 37. The data signal
provided at interconnect 37 is received at the interconnect 27 of device 20
and subsequently propagated to the data input of the latch 25.

Figure 2 illustrates propagation delays associated with the CLOCK
signal from device 20 to the memory device 40 back to the data input of the
latch 25 of device 20. Specifically, a clock signal labeled CLOCK is
illustrated as being generated on the device 20, and representations of the
CLOCK signal at various physical locations are illustrated in Figure 2. For
example, the signal labeled CK(29) represents the CLOCK signal as would
be interpreted at the interconnect 29, while the signal labeled CK(39)
represents the signal as would be interpreted at the interconnect 39.

A propagation delay time PD0 is introduced to the CLOCK signal as
it propagates to the interconnect 29. The propagation time PD0 includes
static and dynamic time variations through the component 24 and its
associated traces. For example, the speed of the driver component 24 can
vary dynamically based upon differences in voltage and temperature.

A propagation delay time PD1 occurs as the CLOCK signal
propagates from interconnect 29 to interconnect 39. The propagation time
PD1 includes primarily static time variations from traces associated with
path 34. The propagation time PD1 also includes dynamic time variations
when path 34 includes active components.

A propagation delay PD2 through device 40 begins once an active
edge of a representation of the CLOCK signal is received at interconnect 39,
and ends when the read DATA is asserted, in response to receiving the
active clock edge, on the interconnect 37.

A propagation delay time PD3 occurs as the DATA propagates from 
interconnect 37 to interconnect 27. The propagation time PD3 includes 
primarily static time variations when path 32 includes primarily traces, and 
includes dynamic time variations when path 32 includes active components. 

A propagation delay time PD4 occurs as the read DATA propagates
from interconnect 27 to the data input of the latch 25. The propagation time
PD4 includes both static and dynamic time variations.

The total propagation delay associated with receiving data at the latch
25 is equal to the sum of PD0 through PD4. Because static delays can vary
system to system, and because dynamic delays vary over time within a
[...] at the latch 25. In the prior art, this uncertainty is typically accounted
for with timing margin against the specified setup and hold requirements of
the latch 25, thereby reducing the maximum obtainable clock frequency. In
accordance with a specific embodiment of the present invention, the effects
of the total propagation delay on timing margin are removed through the use
of a reference clock signal that, in one embodiment, is delayed by an amount
approximately equal to the sum of propagation delays PD1 and PD3. Figure
1 illustrates a reference clock signal (REFCK) being received at a reference
input (REF) of a delay locked loop (DLL). REFCK is generated using the
CLOCK signal from device 20 to generate a clock having a delay similar to
the data path delay described above with reference to Figure 2. In one
embodiment, the delay from CLOCK is obtained by routing a representation
of the CLOCK signal through interconnect 129, onto the circuit board 30,
through circuit board path 35, to interconnect 130, and finally through an
input buffer 14. This is illustrated in Figure 3, where CK(129) is a
representation of the CLOCK signal at interconnect 129, and CK(130) is a
representation of the CLOCK signal at interconnect 130. The delay
introduced between CK(129) and CK(130) is approximately equal to the
sum of PD1 and PD3.
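
The delay budget described above can be written out as a short
calculation. This is a sketch only: the delay values are hypothetical
placeholders rather than figures from this specification, and the class
and method names are introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPathDelays:
    """Delays along the READ path, in nanoseconds (all values hypothetical)."""
    pd0: float  # CLOCK to interconnect 29 (component 24 and its traces)
    pd1: float  # interconnect 29 to interconnect 39 (path 34)
    pd2: float  # clock edge at interconnect 39 to DATA at interconnect 37
    pd3: float  # interconnect 37 to interconnect 27 (path 32)
    pd4: float  # interconnect 27 to the data input of latch 25

    def total(self) -> float:
        """Total propagation delay: the sum of PD0 through PD4."""
        return self.pd0 + self.pd1 + self.pd2 + self.pd3 + self.pd4

    def board_delay(self) -> float:
        """Target delay between CK(129) and CK(130): about PD1 + PD3."""
        return self.pd1 + self.pd3


if __name__ == "__main__":
    d = ReadPathDelays(pd0=0.5, pd1=1.0, pd2=3.0, pd3=1.0, pd4=0.5)
    print("total read delay (ns):", d.total())
    print("delay built into REFCK through path 35 (ns):", d.board_delay())
```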

In a specific embodiment, the circuit board path 35 is designed to
match the cumulative trace length of path 34 and path 32. This matching of
trace lengths compensates for static variations in the printed circuit board,
since the path 35 will have approximately the same static variation as the
combination of paths 34 and 32. In another embodiment, the static
variations of the paths 34 and 32, if any, can be compensated for by
matching active components as well as the trace length. In yet another
embodiment, one or more active components can be included in path 35 to
match static and dynamic delays through the device 40 or the path 24. For
example, if path 34 includes a clock driver, a similar clock driver can be
included in path 35. In another embodiment, path 35 can be driven by an
output of a clock driver of path 34 instead of by a separate output of the
device 20. The REFCK signal, which is a representation of CLOCK, can
then be used to compensate for the propagation delay associated with READ
data from the memory 40.
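
Trace-length matching for path 35 can be illustrated with a small
calculation. The delay per unit length used below is an assumed,
illustrative figure for a printed circuit board trace, and the lengths are
hypothetical; actual values depend on the board material and geometry.
The function names are introduced here for illustration.

```python
# Assumed illustrative value; real boards vary with material and geometry.
PS_PER_MM = 6.5


def trace_delay_ps(length_mm: float, ps_per_mm: float = PS_PER_MM) -> float:
    """Static delay of a passive trace under a constant delay-per-length model."""
    return length_mm * ps_per_mm


def matched_reference_length_mm(path_34_mm: float, path_32_mm: float) -> float:
    """Length for path 35 that matches the cumulative length of paths 34 and 32."""
    return path_34_mm + path_32_mm


if __name__ == "__main__":
    len_35 = matched_reference_length_mm(path_34_mm=40.0, path_32_mm=42.0)
    round_trip_ps = trace_delay_ps(40.0) + trace_delay_ps(42.0)
    print("path 35 length (mm):", len_35)
    print("path 35 delay (ps):", trace_delay_ps(len_35))
    print("paths 34 + 32 delay (ps):", round_trip_ps)
```

Because both sides scale with the same per-length delay in this model, a
board whose traces are uniformly slower or faster shifts path 35 and the
combination of paths 34 and 32 by the same amount.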

In another embodiment of the invention, static and dynamic variations
associated with providing a representation of the REFCK clock signal to the
clock input of the latch 25 are compensated for. For example, referring to
Figure 3, a clock distribution tree 109 is used to drive a plurality of
latches, including latches 211-213. [...] of latches 211-213, and an active
clock edge received at the clock input of the latches 211-213, the timing
margins of the latches need to be relaxed, thereby limiting the maximum
obtainable clock frequency.

The output of the DLL 116 drives the clock distribution network that
includes paths 221-224. The clock distribution network provides clock
signals to each of the latches 211-213. One leaf (endpoint) of the clock
distribution network provides feedback (MCLOCK) to the feedback input of
the DLL, while the REFCK signal drives the reference input of the DLL.

The DLL functions to match the MCLOCK signal to the REFCK
signal by adjusting the position of the clock driven onto the distribution
network, [...] the endpoints of the clock distribution network to be
approximately synchronous with the REFCK. This synchronization, performed
by the DLL, compensates for dynamic delay variations of the clock
distribution network. Also, when the REFCK signal is selected to match at
least a portion of the propagation delay associated with receiving data
during a READ access of device 40, and the MCLOCK signal is the clock signal
from an end point of the distribution network that represents when READ
data is to be latched, the DLL is used to position the clocks at the endpoints
of the clock distribution network to be approximately synchronous with the
REFCK. In this manner, compensation can be made for the READ data
propagation time and its variations, and for the clock distribution network
clock propagation time and its variations.
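
The locking behavior described above can be approximated with a
behavioral sketch. It is not a circuit model and is not taken from this
specification: it assumes a periodic REFCK, a DLL that adds an adjustable
delay, a clock tree that adds a further delay before MCLOCK, and a
fixed-step adjustment rule. The function names, the step size, and the
delay values are hypothetical.

```python
def phase_error_ns(dll_delay_ns: float, tree_delay_ns: float, period_ns: float) -> float:
    """Signed offset of the MCLOCK edge from the nearest REFCK edge.

    Positive means MCLOCK arrives late. Result lies in [-period/2, period/2).
    """
    half = period_ns / 2.0
    return (dll_delay_ns + tree_delay_ns + half) % period_ns - half


def lock(dll_delay_ns, tree_delay_ns, period_ns, step_ns=0.05, max_iters=10_000):
    """Step the DLL delay until the feedback edge lines up with the reference."""
    for _ in range(max_iters):
        err = phase_error_ns(dll_delay_ns, tree_delay_ns, period_ns)
        if abs(err) <= step_ns / 2.0:
            break
        # Late feedback: shorten the delay. Early feedback: lengthen it.
        dll_delay_ns += -step_ns if err > 0 else step_ns
        dll_delay_ns %= period_ns  # model a delay line that wraps within one period
    return dll_delay_ns


if __name__ == "__main__":
    period = 10.0
    d = lock(0.0, tree_delay_ns=2.3, period_ns=period)
    print("locked, residual error (ns):", phase_error_ns(d, 2.3, period))
    # The tree delay drifts (for example with temperature); the loop re-locks.
    d = lock(d, tree_delay_ns=2.8, period_ns=period)
    print("re-locked, residual error (ns):", phase_error_ns(d, 2.8, period))
```

In this sketch the DLL absorbs the change in tree delay by shortening its
own delay, so the edge seen at the feedback endpoint stays aligned with
REFCK to within half a step.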

Note that a well-designed clock distribution tree 109 provides
approximately the same propagation delay at each endpoint, and dynamic
time delays are equal at all endpoints. This is the reason why a single
endpoint can generally represent the delay of the entire tree.

Figure 2 illustrates the hold time (Thold) and the setup time (Tsetup)
obtained between the rising edge of REFCK and D(Latch25). Because of
the compensation techniques used to generate REFCK, the variance in the
realized Thold and Tsetup times is reduced, thereby allowing for a more
aggressive clock rate. Note, in another embodiment, the data at latch 25 can
be latched at the falling edge of Tsetup.
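
The effect of reduced variance on the realized Tsetup and Thold can be
sketched as follows. The data-valid window, the clock edge position, and
the shift values are hypothetical, and the function names are introduced
here for illustration; the shifts stand for movement of the data window
relative to the latching clock edge.

```python
def setup_hold_ns(data_valid_start_ns, data_valid_end_ns, clock_edge_ns):
    """Return (Tsetup, Thold) realized at a latch for one clock edge.

    Tsetup is how long the data was valid before the edge; Thold is how long
    it stays valid after the edge. A negative value means a violation.
    """
    return clock_edge_ns - data_valid_start_ns, data_valid_end_ns - clock_edge_ns


def worst_case_margins(shifts_ns, valid_start_ns, valid_end_ns, clock_edge_ns):
    """Smallest Tsetup and Thold seen as the data window shifts against the clock."""
    results = [
        setup_hold_ns(valid_start_ns + s, valid_end_ns + s, clock_edge_ns)
        for s in shifts_ns
    ]
    return min(r[0] for r in results), min(r[1] for r in results)


if __name__ == "__main__":
    # Data nominally valid from 0 ns to 10 ns, latched at 5 ns.
    print(worst_case_margins([-3.0, 0.0, 3.0], 0.0, 10.0, 5.0))  # wide variation
    print(worst_case_margins([-0.5, 0.0, 0.5], 0.0, 10.0, 5.0))  # reduced variation
```

With less relative movement between data and clock, the worst-case
Tsetup and Thold stay closer to their nominal values, which leaves room
to shorten the clock period.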

The methods of Figures 4-7 further illustrate specific embodiments of
the present invention.

At step 401 of Figure 4, a clock signal is received. In one
embodiment, the clock signal represents the reference clock that is used to
generate subsequent clock signal(s), for example, the signal REFCK of
Figure 1. In this manner, the clock signal is a delayed representation of
another clock signal. For example, the clock signal can be a representation
of the CLOCK signal of Figure 1, which is delayed by matching a
propagation delay path of a READ data access as previously described. In
another embodiment, the clock signal can be generated from an external
clock source.

At step 402, a distributed clock signal is provided to a clock
distribution network having a plurality of endpoints connected to a
respective plurality of components. In the example of Figure 3, the plurality
of components are latches used to latch data from a memory during a read
operation. Note, the DLL can also be considered one of the plurality of
components. The number of components can vary depending upon the
memory configuration. The clock distribution network can have a single
node that includes a plurality of traces, or it can include a plurality of nodes
that are driven by a plurality of active components that distribute the clock
through the distribution network.

At step 403, the distributed clock signal is modified based on a
portion of the distributed clock signal that is received at a first end point of
the clock distribution network. The modification to the distributed clock
signal being driven substantially synchronizes the distributed clock signal
received at the first end point of the clock distribution network to the first
clock signal. This is accomplished when the distributed clock signal
received as feedback at the DLL is delayed from the first clock signal so that
active edges received at the first endpoint occur at approximately the same
time as the active edges of the clock signal received at the reference input of
the DLL. Note that the term "approximately the same" indicates that
reasonable design tolerances and limitations will prevent the active edges
from occurring at an absolute exact point in time. As illustrated in Figures 1
and 3, a DLL can be used to delay the distribution clock signal from the first
clock signal.

Generally it will be desirable to select or design the first endpoint to
be representative of each endpoint of the distribution network. By choosing
a representative endpoint, the signals received at each of the plurality of
endpoints will occur at approximately the same time as the signal received at
the first endpoint. Therefore, it is generally preferred that the trace length
and any components associated with driving the first endpoint be matched to
be representative of each of the other endpoints. In one
embodiment, assuming the distribution network includes only traces, the
first endpoint will be chosen to have a length that is an average or median
length of the traces to all other endpoints. In one embodiment, the leaf of
the distribution network that is fed back to the delay element will be
specifically designed to be an average or median representation of all leaves
of the distribution network. In an alternate embodiment, a feedback loop
can be formed by connecting to a leaf that is already connected to another
component.
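
Choosing a representative endpoint by trace length can be sketched as
below. This is an illustration under the traces-only assumption stated
above; the leaf names and lengths are hypothetical, and the median is used
here although an average could be substituted.

```python
from statistics import median


def representative_endpoint(trace_lengths_mm: dict) -> str:
    """Pick the endpoint whose trace length is closest to the median length."""
    if not trace_lengths_mm:
        raise ValueError("at least one endpoint is required")
    target = median(trace_lengths_mm.values())
    return min(trace_lengths_mm, key=lambda name: abs(trace_lengths_mm[name] - target))


if __name__ == "__main__":
    # Hypothetical leaves of a clock distribution network, lengths in mm.
    leaves = {
        "leaf_a": 11.8,
        "leaf_b": 12.0,
        "leaf_c": 12.3,
        "leaf_d": 12.9,
        "leaf_e": 13.1,
    }
    print("feedback leaf:", representative_endpoint(leaves))
```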

Figure 5 illustrates an alternate method in accordance with the present
invention. At step 501, a first clock signal is provided from a first device,
such as device 20 of Figure 1.

At step 502, a representation of the first clock signal is received at the
first device. The representation of the first clock signal received at the first
device has been propagated away from the first device. For example, the
first clock signal can be driven from the device 20 onto the printed circuit
board (second device), and back to the device 20.

At step 503, the representation of the first clock signal is provided to a
delay element. The delay element described herein is a DLL; however, other
components capable of synchronizing clock signals in the manner described
can also be used. In the embodiment described herein, the representation of
the first clock signal has been provided to the reference input of the DLL.

At step 504, a delayed clock signal is provided from the delay element 
to a clock distribution tree. 

At step 505, a representation of the delayed clock signal is provided 
from a first leaf, or endpoint, of the clock distribution tree to the delay 
element. In the embodiment described herein, the delayed clock signal has 
been provided to the feedback input of the DLL. 

At step 506, the delayed clock signal provided by the delay element is
modified based upon the representation of the delayed clock signal. In one
embodiment, the delayed clock signal is modified to be substantially
synchronized to the representation of the first clock signal. The delayed
clock signal is synchronized to the first clock signal when it is delayed from
the first clock signal so that active edges received at the first leaf occur at
approximately the same time as the active edges of the representation of the
first clock signal are received at the delay element.
Note that the term "approximately the same" indicates that reasonable design
tolerances and limitations will prevent the active edges from occurring at an
absolute exact point in time.

The method of Figure 6 illustrates another embodiment of the present
invention. At step 601, a first clock edge is generated at a first device at a
first time. For example, the first clock edge corresponds to an edge of the
signal labeled CLOCK, or the signal labeled CK(29) in Figure 2. At step
602, the first clock edge is received at a second device at a second time.
Note, an edge of the signal CK(39) of Figure 2 corresponds to receiving the
first clock edge at the second device. At step 603, a data signal is generated
from a second device at a third time in response to receiving the first clock
edge, see signal D(37) of Figure 2. At step 604, the data signal is received at
a latch component of the first device. For example, the signal D(Latch25)
corresponds to the data being received at a data input of latch 25 of Figure 2.
At step 605, a representation of the first clock is provided to a delay
component. For example, REFCK is provided to the DLL 116 of Figure 2.
At step 606, a distributed clock is generated from the delay component to
drive a clock distribution network. At step 607, a representation of the
distributed clock is received at the delay component. At step 608, the
distributed clock is modified until the representation of the distributed
clock at the first endpoint is synchronized with the representation of the
first clock.

The method of Figure 7 illustrates another embodiment of the present
invention. At step 701, a first clock is provided from a first device and
transmitted over a first substrate to a second device, where the first substrate
is not part of the first or second device. [...] the first substrate corresponds
to the circuit board 30. At step 702, a second clock signal is provided from
the first device and transmitted over a second substrate that is not part of
the first or second devices. At step 703, the second clock signal is received
at [...] delay [...] modified second clock signal. [...] corresponds to a signal
transmitted through path 35 of Figure [...]. At step 704, a third signal is
received at a [...]; specifically, the third signal has been [...] that is not
part of the first device or the second device. [...] clock signal. In one
embodiment, the first, second, and [...] common substrates, such as the
circuit board substrate 30 of Figure 1. [...] the latching signal.

In the foregoing specification, the invention has been described with
reference to specific embodiments. [...] departing from the scope [...]
clock signal that is a shifted representation of the signal received at the
endpoints of the distribution tree. For example, the DLL can [...] In
addition, the [...] phase from a signal at an endpoint of the distribution
[...] DLL can synchronize to a [...] having a duty [...] signal and provide a
signal to the clock [...] active or passive circuit [...] specific delay;
however, this will generally introduce additional timing variations into the
system. [...] to be received at the reference [...] Accordingly, the [...]
within the scope of the present invention. Benefits, other advantages, and
[...] benefits, [...]